Proposition for a sequential accelerator in future general-purpose manycore processors

Authors

  • Pierre Michaud
  • André Seznec
  • Yiannakis Sazeides
Abstract

The number of transistors that can be placed on a given silicon area doubles with every technology generation. Consequently, the number of on-chip cores is increasing quickly, making it possible to build general-purpose processors with hundreds of cores in the near future. However, although a large number of cores is beneficial for speeding up parallel code sections, it is also important to speed up sequential execution. We argue that it will be possible and desirable to dedicate a large fraction of the chip area and power to high sequential performance. Current processor design styles are restricted by the implicit constraint that a processor core should be able to run continuously; therefore, power-hungry techniques that would allow very high clock frequencies are not used. The “sequential accelerator” we propose removes the constraint of continuous operation. It consists of several cores designed for ultimate instantaneous performance. These cores are large and power-hungry: they cannot run continuously (thermal constraint) and cannot be active simultaneously (power constraint). A single core is active at any time, and inactive cores are power-gated. Execution is migrated periodically to a new core so as to spread heat generation uniformly over the whole accelerator area, which solves the temperature issue. The sequential accelerator will be a viable solution only if the performance penalty due to migrations can be tolerated. Migration-induced cache misses may incur a significant performance loss, and we propose some solutions to alleviate this problem. We also propose a migration method, using integrated thermal sensors, in which the migration interval is variable and depends on the ambient temperature. The migration penalty can be kept negligible as long as the ambient temperature is maintained below a threshold.
Key-words: Multicore processor, sequential performance, power, temperature, migration, caches

∗ University of Cyprus

inria-00433234, version 3, 10 Dec 2009

French title: Proposition d’accélérateur séquentiel pour les multi-coeurs généralistes futurs
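
The sensor-triggered migration described in the abstract can be illustrated with a toy thermal model. This is only a sketch under stated assumptions, not the authors' method: the function name `simulate`, the lumped first-order cooling model, the "move to the coolest core" policy, and every numeric parameter (core count, temperature limit, heating and cooling rates) are hypothetical choices made for illustration.

```python
def simulate(ambient, n_cores=4, t_max=85.0, heat=5.0, cool=0.05, steps=10_000):
    """Toy model of sensor-triggered migration (all parameters hypothetical).

    Returns the mean number of time steps between two migrations.
    """
    temps = [ambient] * n_cores   # every core starts at ambient temperature
    active = 0                    # exactly one core is active at any time
    migrations = 0
    for _ in range(steps):
        # Every core relaxes toward the ambient temperature.
        for i in range(n_cores):
            temps[i] += cool * (ambient - temps[i])
        # Only the active core dissipates power; the others are power-gated.
        temps[active] += heat
        # The thermal sensor of the active core triggers the migration.
        if temps[active] >= t_max:
            active = min(range(n_cores), key=temps.__getitem__)
            migrations += 1
    return steps / migrations if migrations else float("inf")


if __name__ == "__main__":
    for ambient in (25.0, 35.0, 45.0):
        print(f"ambient {ambient:.0f} C: mean interval {simulate(ambient):.1f} steps")
```

In this model the migration interval is not fixed: it shrinks as the ambient temperature rises, because each core has less thermal headroom before its sensor reaches the limit. Shorter intervals mean more frequent migrations, which is consistent with the abstract's claim that the migration penalty stays small only while the ambient temperature remains below a threshold.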


Related resources

Cache-aware Parallel Programming for Manycore Processors

With rapidly evolving technology, multicore and manycore processors have emerged as promising architectures to benefit from increasing transistor numbers. The transition towards these parallel architectures makes today an exciting time to investigate challenges in parallel computing. The TILEPro64 is a manycore accelerator, composed of 64 tiles interconnected via multiple 8×8 mesh networks. It ...

Stream-based Parallel Computing Methodology and Development Environment for High Performance Manycore Accelerators

The latest supercomputers incorporate a large number of compute units in the form of manycore accelerators. Such accelerators, like GPUs, have integrated processors where a very large number of threads, on the order of thousands, execute concurrently. Compared to single-CPU throughput performance, they offer higher levels of parallelism. Therefore, they represent an indispensable technolo...

Source-to-source compilation of loop programs for manycore processors

It is widely accepted today that the end of microprocessor performance growth based on increasing clock speeds and instruction-level parallelism (ILP) demands new ways of exploiting transistor densities. Manycore processors (most commonly known as GPGPUs or simply GPUs) provide a viable solution to this performance scaling bottleneck through large numbers of lightweight compute cores and memory...

On the energy efficiency and performance of irregular application executions on multicore, NUMA and manycore platforms

Until the last decade, the performance of HPC architectures was quantified almost exclusively by their processing power. However, energy efficiency has recently come to be considered as important as raw performance and has become a critical aspect of the development of scalable systems. These strict energy constraints guided the development of a new class of so-called light-weight manycore processor...

Can Broken Multicore Hardware be Mended?

A suggestion is made for mending multicore hardware, which has been diagnosed as broken. 1. THE MULTICORE ERA IS A CONSEQUENCE OF THE STALLING OF SINGLE-THREAD PERFORMANCE. The multi- and many-core (MC) era we have reached was triggered after the beginning of the century by the stalling of single-processor performance. Technology allowed more transistors to be placed on a die, but they could n...

Publication date: 2009